Word vectors, reuse, and replicability: Towards a community repository of large-text resources

Authors

  • Murhaf Fares
  • Andrey Kutuzov
  • Stephan Oepen
  • Erik Velldal
Abstract

This paper describes an emerging shared repository of large-text resources for creating word vectors, including pre-processed corpora and pre-trained vectors for a range of frameworks and configurations. This will facilitate reuse, rapid experimentation, and replicability of results.

Similar resources

Comparing methods for automatic acquisition of Topic Signatures

The main goal of this work is to compare two methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever and Infomap, to acquire Topic Signatures from a corpus. Using these tools, we retrieve sense examples from large text collections. Both systems construct a query for each word sense using WordNet. ...

The Addgene repository: an international nonprofit plasmid and data resource

The Addgene Repository (http://www.addgene.org) was founded to accelerate research and discovery by improving access to useful, high-quality research materials and information. The repository archives plasmids generated by scientists, conducts quality control, annotates the associated data and makes the plasmids and their data available to the scientific community. Plasmid associated data under...

Towards Distributed Learning Organizational Memories

This paper presents an analysis of a learning organizational memory and some disadvantages that such a centralized application contains. One issue is the reuse of a prototype giving access to learning resources, outside the university where it was created, by teachers of the same domain, even if they are very interested in doing so. One attempt at resolving this problem is to integrate distributed...

AHDS Digital Repository

The Arts and Humanities Data Service (AHDS) was established in 1996 to collect, preserve, and encourage the reuse of digital resources created during scholarly research in the arts and humanities. The AHDS is now responsible for the preservation of over 3,000 digital resources and holds a wide range of data types, from plain text and image files to datasets (spreadsheets, databases, statistica...

HESA: The Construction and Evaluation of Hierarchical Software Feature Repository

Nowadays, the demand for software resources at different granularities is becoming prominent in the software engineering field. However, a large quantity of heterogeneous software resources have not been organized in a reasonable and efficient way. Software features, an important kind of knowledge for software reuse, are ideal materials to characterize software resources. Our preliminary study shows t...

Publication date: 2017